Implementing a parallel matrix factorization library on the Cell Broadband Engine
Abstract
Matrix factorization (often called decomposition) is a frequently used kernel in a large number of applications, ranging from linear solvers to data clustering and machine learning. The central contribution of this paper is a thorough performance study of four popular matrix factorization techniques, namely LU, Cholesky, QR, and SVD, on the STI Cell Broadband Engine. The paper explores algorithmic as well as implementation challenges related to the Cell chip multiprocessor and explains how we achieve near-linear speedup on most of the factorization techniques for a range of matrix sizes. For each of the factorization routines, we identify the bottleneck kernels and explain how we have attempted to resolve each bottleneck and to what extent we have been successful. For the largest data sets that we use, our implementations running on a two-node 3.2 GHz Cell BladeCenter (exercising a total of sixteen SPEs) deliver, on average, 203.9, 284.6, 81.5, 243.9, and 54.0 GFLOPS for dense LU, dense Cholesky, sparse Cholesky, QR, and SVD, respectively. On sixteen SPEs, the implementations achieve speedups of 11.2, 12.8, 10.6, 13.0, and 6.2 for dense LU, dense Cholesky, sparse Cholesky, QR, and SVD, respectively. We discuss the interesting interactions that result from parallelizing the factorization routines on a two-node non-uniform memory access (NUMA) Cell Blade cluster.
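As a rough check on the "near-linear speedup" claim, the speedups reported above can be converted into parallel efficiency, E = S / p, with p = 16 SPEs. The sketch below is a minimal illustration under one stated assumption: each reported speedup is taken at face value, because the abstract does not say which baseline it is measured against. The names `REPORTED`, `parallel_efficiency`, and `efficiency_table` are hypothetical and are not part of the paper's library.

```python
"""Parallel efficiency implied by the speedups quoted in the abstract.

Assumption (not stated in the abstract): efficiency is computed as
E = S / p with p = 16 SPEs, taking each reported speedup S at face value.
All names here are hypothetical helpers, not the paper's library API.
"""

NUM_SPES = 16  # sixteen SPEs in total on the two-node Cell BladeCenter

# (routine, average GFLOPS on 16 SPEs, speedup on 16 SPEs), as quoted
REPORTED = [
    ("dense LU", 203.9, 11.2),
    ("dense Cholesky", 284.6, 12.8),
    ("sparse Cholesky", 81.5, 10.6),
    ("QR", 243.9, 13.0),
    ("SVD", 54.0, 6.2),
]


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup achieved: E = S / p."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


def efficiency_table(rows, processors):
    """Map each routine name to its parallel efficiency on `processors` units."""
    return {name: parallel_efficiency(s, processors) for name, _gflops, s in rows}


if __name__ == "__main__":
    for name, eff in efficiency_table(REPORTED, NUM_SPES).items():
        print(f"{name:<16} E = {eff:.2%}")
```

Under that reading, QR and dense Cholesky retain roughly 80% of ideal linear speedup, dense LU 70%, sparse Cholesky about 66%, and SVD under 40%, which is consistent with the abstract's qualifier that near-linear speedup is reached on most, not all, of the routines.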
Similar resources
QR factorization for the Cell Broadband Engine
The QR factorization is one of the most important operations in dense linear algebra, offering a numerically stable method for solving linear systems of equations including overdetermined and underdetermined systems. Modern implementations of the QR factorization, such as the one in the LAPACK library, suffer from performance limitations due to the use of matrix–vector type operations in the ph...
Parleda: a Library for Parallel Processing in Computational Geometry Applications
ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...
A Modified Digital Image Watermarking Scheme Based on Nonnegative Matrix Factorization
This paper presents a modified digital image watermarking method based on nonnegative matrix factorization. First, the host image is factorized into the product of three nonnegative matrices. Then, the central matrix is transformed into the discrete cosine transform domain. The watermark is embedded in the low-frequency band of this matrix, and the inverse transform is then computed. Finally, watermarked ...
Evaluating the Portability of UPC to the Cell Broadband Engine
Unified Parallel C (UPC) is a parallel programming language for distributed as well as shared memory systems. The Cell Broadband Engine (Cell BE) is a state-of-the-art multicore processor. In this paper we evaluate the opportunities and pitfalls of implementing the Berkeley UPC runtime system API for the Cell BE and thus bringing UPC to Cell. We propose a mapping of the distributed shared memor...
Journal title:
- Scientific Programming
Volume 17, Issue
Pages: -
Publication date: 2009